Abstract

Stimulus from the environment that guides behavior and informs decisions is encoded in
the firing rates of neural populations. Each neuron in the populations, however, does not spike
independently: spike events are correlated from cell to cell. To what degree does this apparent
redundancy impact the accuracy with which decisions can be made, and the computations that
are required to optimally decide? We explore these questions for two illustrative models of
correlation among cells. Each model is statistically identical at the level of pairs of cells, but
differs in higher-order statistics that describe the simultaneous activity of larger cell groups.
We find that the presence of correlations can diminish the performance attained by an ideal
decision maker to either a small or large extent, depending on the nature of the higher-order
interactions. Moreover, while this optimal performance can in some cases be obtained via
the standard integration-to-bound operation, in others it requires a nonlinear computation on
incoming spikes. Overall, we conclude that a given level of pairwise correlations, even when
restricted to identical neural populations, may not always indicate redundancies that diminish
decision making performance.



1 Introduction 



Sensory information is often encoded in irregularly spiking neural populations. One well-studied
example is given by direction-selective cells in area MT, whose firing rates depend on the degree
and direction of coherent motion in the visual field [10, 30, 35]. Individual neurons in MT, as
in many other brain areas, exhibit noisy and variable spiking [30], as can be modeled by Poisson
point processes [38, 40]. Moreover, this variable spiking is generally not independent from cell to
cell. Returning to our example, a number of studies have measured pairwise correlations in MT
during direction discrimination tasks as well as smooth-pursuit eye movements [24, 31, 46, 14]; while
this measurement is a subtle endeavor experimentally, a number of studies suggest a value near
$\rho \approx 0.1$ to $0.15$ ([13] summarizes these observations for a number of brain areas).

What are the consequences of correlated spike variability for the speed and accuracy of sensory
decisions? The role of pairwise correlations in stimulus encoding has been the subject of many
prior studies [34, 26, 12]. The results are rich, showing that correlations can have positive, negative,
or neutral effects on levels of encoded information. The present study serves to extend this body
of work in two ways. First, as done in a different context by [19, 29], we contrast the impact of
correlations that have the same pairwise level but a different structure at higher orders.

Second, as in [14], we consider the impact of correlations on decisions that unfold over time, by
combining a sequence of samples observed over time in the sensory populations. A classical example 
that we will use to describe and motivate our studies is the moving dots direction discrimination 
task. Here, a fraction of dots in a visual display move coherently in a given direction, while 
the remainder display random motion; the task is to identify the direction from two possible 
alternatives. Decisions become increasingly accurate as subjects take (or are given) longer to make 
the decision. 

In analyzing decisions that develop over time, we utilize a central result from sequential analysis. 
This is the Sequential Probability Ratio Test (SPRT) [42, 21], which linearly sums the log-odds of
independent observations from a sampling distribution until a predetermined evidence threshold is 
reached. The SPRT is the optimal statistical test in that it gives the minimum expected number 
of samples for a required level of accuracy in deciding among two task alternatives. 

We pose two related questions based on the SPRT. First, how does the presence of correlated 
spiking in the sampled pools impact the speed and accuracy of decisions produced by the SPRT? 
Our focus is on how the structure of population-wide correlations determines the answer. Second, 
how does the presence of correlated spiking impact the computations that are necessary to perform 
the SPRT? This question is intriguing, because the SPRT may be performed via the simple, linear 
computation of integrating spikes over time and across the populations for a surprisingly broad 
class of inputs, including independent Poisson spike trains [45, 6]. Thus, in this setting optimal
decisions can be made by integrator circuits [6, 22, 12]. Our goal here is to determine whether and
when this continues to hold true for correlated neural populations. 

We answer these questions for two illustrative models of correlated, Poissonian spiking. We 
emphasize that the spikes that these models produce are indistinguishable at the level of both single 
cells and pairs of cells. However, they differ in higher-order correlations, in that they can only be 
distinguished by examining the statistics of three or more neurons. In the first model, correlations 
are introduced via shared spike events across the entire pool. In this case optimal inference via the 
SPRT produces fast and accurate decisions, but depends on a nonlinear computation. As a result, 
the simpler computation of spike integration requires, on average, longer times to reach the same 
level of accuracy. In contrast, when shared spiking events are more frequent but are common to 
fewer neurons within a pool, performance under the SPRT is significantly diminished. However, 
in this case both SPRT and spike integration perform comparably, so a linear computation can 
produce decisions that are close to optimal. 

2 Models of evidence accumulation and encoding 
2.1 Model neural populations and the decision task 

We begin by introducing the notation for the two decision making models that will be compared. In
this study we consider the case of discrimination between two alternatives, and therefore model two
populations of neurons that encode the strength of evidence for each alternative. Returning to the
moving dots task for illustration, each population could be the set of MT cells that are selective for
motion in a given direction. Each population represents the dot motion $C$
via its firing rate, $\lambda_p$ or $\lambda_n$; here the subscripts indicate the "preferred" and "null" populations,
which correspond to the motion direction of the visual stimulus versus the alternate direction. In
this way, the firing rate of neurons encoding the preferred direction will be higher than that of
neurons encoding the null direction, $\lambda_p > \lambda_n$. Following [33] (see also [28, 10]), we model this
relationship as linear:

$$\lambda_p = 40 + 0.4\,C \ \text{Hz} \tag{1}$$
$$\lambda_n = 40 - 0.4\,C \ \text{Hz}. \tag{2}$$

Throughout the text we present results at $C = 6.4$; however, the results do not depend on
this particular value of dot motion or on its precise relationship to firing rate.
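As a small worked example of Equations 1 and 2 (an illustrative sketch, not code from the original study; the function name `firing_rates` is hypothetical), the two rates and their log ratio at $C = 6.4$ can be computed directly:

```python
import math

def firing_rates(dot_motion):
    """Return (lambda_p, lambda_n) in Hz, following Equations 1 and 2."""
    lam_p = 40.0 + 0.4 * dot_motion
    lam_n = 40.0 - 0.4 * dot_motion
    return lam_p, lam_n

# Value of dot motion used throughout the text.
lam_p, lam_n = firing_rates(6.4)

# Log ratio of the two rates; it sets the step size of the SPRT random walk.
log_ratio = math.log(lam_p / lam_n)

print(f"lambda_p = {lam_p:.2f} Hz, lambda_n = {lam_n:.2f} Hz")
print(f"log(lambda_p / lambda_n) = {log_ratio:.3f}")
```

The log ratio comes out near 0.128, which is the increment size quoted in the figure captions.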

In our model, we assume that each population consists of $N$ neurons firing spikes via a homogeneous
Poisson process, with rate $\lambda_p$ or $\lambda_n$. We use the notation $x_k(t)$ to denote each spike train.
Integrating these processes over a time interval $\Delta T$ provides two time series of $N$-dimensional
vectors of Poisson random variables; these independent vectors provide the input to the decision
making models. Specifically, for the $k$th neuron in a pool, on the $i$th time step,

$$S_k^i = \int_{i\Delta T}^{(i+1)\Delta T} x_k(t)\,dt \sim \mathrm{Poiss}(\lambda \Delta T). \tag{3}$$

The properties of Poisson processes imply that $S_k^i$ is independent from $S_k^j$ ($i \neq j$), i.e., for different
time steps.

However, the outputs of different neurons in the same time window are not, in general, independent.
Following experimental observations that neurons with similar directional tuning tend to be correlated,
while those with very different tuning are not [46, 14], we model neurons from different pools
as independent and those within a single pool as correlated with a correlation coefficient $\rho$:

$$\rho = \frac{\mathrm{Cov}[S_k^i, S_l^i]}{\sqrt{\mathrm{Var}[S_k^i]\,\mathrm{Var}[S_l^i]}}, \qquad k \neq l. \tag{4}$$

This implies that, with vector notation for the probability distribution of spike counts for each
pool,

$$P[\mathbf{S}_p^i, \mathbf{S}_n^i] = P[\mathbf{S}_p^i]\,P[\mathbf{S}_n^i]. \tag{5}$$

Next, we introduce notation for decision making between the two task alternatives. The task of 
determining, e.g., direction in the moving dots task is that of determining which of the two pools 
fires spikes with the higher firing rate. We frame this as decision making between the hypotheses 

$$H_1 : \lambda_p > \lambda_n \tag{6}$$
$$H_0 : \lambda_p < \lambda_n, \tag{7}$$

where each alternative corresponds to a decision as to the motion direction. This formalism allows
us to define accuracy as the fraction of trials on which the correct hypothesis $H_1$ is accepted. In
this study we consider decision making tasks at a fixed level of difficulty, so that $\lambda_p$ and $\lambda_n$ do not
vary from trial to trial (i.e., this hypothesis test is simple and not composite).

2.2 Accumulating spikes and evidence over time

We relate the decision making task to a discrete random walk, which follows in turn from the sequential
accumulation of independent and identically distributed (IID) realizations from the sampling
distribution $W_i$. We will specify this distribution below; for now, we note that the random walk
takes the general form:

$$E_0 = 0 \tag{8}$$
$$E_{n+1} = E_n + W_n. \tag{9}$$

In a drift-diffusion model of decision making, accumulation continues as long as $|E_n| < \theta$, the
decision threshold [33, 21, 6]. The number of increments necessary to cross one of the two thresholds,
multiplied by the increment duration $\Delta T$, defines the decision time; this is a random variable, as it
varies from trial to trial. Crossing the threshold corresponding to $H_1$ is interpreted as a correct trial; the
fraction of correct (FC) trials defines the accuracy of the model. Together, the expected (mean)
decision time (DT) and accuracy (FC) determine the performance of a decision making model.

Formulas for the mean decision time and accuracy are given in Wald [41] as a function of the
sampling distribution and the decision threshold. Importantly, these formulas are exact under the
assumption that the final increment in $E_n$ does not overshoot the threshold, a point we return to
below. Given the moment generating function for the sampling distribution,

$$\phi(s) = E[e^{Ws}], \tag{10}$$

speed and accuracy are given by:

$$FC \approx \frac{1}{1 + e^{h_0 \theta}} \tag{11}$$
$$DT \approx \frac{\theta\,\Delta T}{E[W]} \tanh\left(\frac{-h_0 \theta}{2}\right), \tag{12}$$

where $h_0$ is the nontrivial root of $\phi(s) = 1$, i.e.

$$\phi(h_0) = 1, \qquad h_0 \neq 0. \tag{13}$$

We notice here that as $\theta$ increases (and assuming $h_0 < 0$), both FC and DT will increase.
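Equations 11 and 12 are simple enough to evaluate directly. The sketch below is illustrative only: the function names are hypothetical, and the values chosen for $h_0$, $E[W]$ and $\Delta T$ are placeholders rather than values from the paper. It tabulates FC and DT for a few thresholds and shows both growing with $\theta$ when $h_0 < 0$.

```python
import math

def fraction_correct(theta, h0):
    """Equation 11: FC = 1 / (1 + exp(h0 * theta))."""
    return 1.0 / (1.0 + math.exp(h0 * theta))

def decision_time(theta, h0, mean_increment, dt):
    """Equation 12: DT = theta * dt / E[W] * tanh(-h0 * theta / 2)."""
    return theta * dt / mean_increment * math.tanh(-h0 * theta / 2.0)

# Placeholder parameters (assumptions chosen only for illustration).
h0 = -1.0              # nontrivial root of the moment generating function
mean_increment = 0.01  # E[W] per time step
dt = 0.001             # time step in seconds

thresholds = [0.5, 1.0, 2.0, 4.0]
fc_values = [fraction_correct(th, h0) for th in thresholds]
dt_values = [decision_time(th, h0, mean_increment, dt) for th in thresholds]

for th, fc, t in zip(thresholds, fc_values, dt_values):
    print(f"theta = {th:4.1f}   FC = {fc:.3f}   DT = {1000 * t:7.2f} ms")
```

Plotting `fc_values` against `dt_values` gives a speed-accuracy curve that is parametric in the threshold, which is how the figures in the following sections are constructed.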

We now return to the definition of the random increments $W_i$. We consider two different ways
in which this can be done. First, in the spike integration (SI) model, increments are constructed
by counting the spikes emitted in a $\Delta T$ window by the preferred pool, and subtracting the number
emitted by the null pool. This is equivalent to the time evolution of a neural integrator model
that receives spikes as impulses with opposite signs from the preferred and null populations. This
integrate-to-bound model is an analog of the drift-diffusion model (DDM) with inputs that are not
"white noise", but rather Poisson spikes [33], cf. [28]:

$$W_i = \sum_{k=1}^{N} S_{k,p}^i - \sum_{k=1}^{N} S_{k,n}^i. \tag{14}$$

Second, in the Sequential Probability Ratio Test (SPRT), the increment is defined as the log-odds
ratio of observing the spike counts from both of the pools, under each of the two competing
hypotheses:

$$W_i = \log \frac{P[\mathbf{S}_p^i \mid H_1]\,P[\mathbf{S}_n^i \mid H_1]}{P[\mathbf{S}_p^i \mid H_0]\,P[\mathbf{S}_n^i \mid H_0]}. \tag{15}$$

2.3 The case of independent neurons

Zhang and Bogacz [35] present an analysis of speed and accuracy of decision making based on
independent neural pools; for completeness, and to help contrast this result with the correlated case,
we give the key calculations in Appendices A.1 and B.1. Here, choosing increments via the SPRT yields:

$$h_0 = -1 \tag{16}$$
$$E[W] = \Delta T\,N\,(\lambda_p - \lambda_n)\log\left(\frac{\lambda_p}{\lambda_n}\right). \tag{17}$$

Under the spike integration model, Zhang and Bogacz [35] (see also Appendix B.1) find that:

$$h_0 = -\log\left(\frac{\lambda_p}{\lambda_n}\right) \tag{18}$$
$$E[W] = \Delta T\,N\,(\lambda_p - \lambda_n). \tag{19}$$

Therefore, by applying a change of variables $\theta \to \theta \log(\lambda_p/\lambda_n)$ in Equations 11 and 12, spike
integration can implement the SPRT. The implication is that simply counting spikes, positive for one
pool and negative for the other, can implement statistically optimal decisions when the neural
pools are independent.
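The change of variables can be checked numerically. The following is a minimal sketch, assuming $N = 240$ and $C = 6.4$ as in the figures and an arbitrary spike integration threshold of 15; the helper names `wald_fc` and `wald_dt` are hypothetical. It evaluates Equations 11 and 12 with the SPRT parameters (Equations 16 and 17) and with the spike integration parameters (Equations 18 and 19), using thresholds related by $\theta_{SPRT} = \theta_{SI}\log(\lambda_p/\lambda_n)$.

```python
import math

def wald_fc(theta, h0):
    """Equation 11."""
    return 1.0 / (1.0 + math.exp(h0 * theta))

def wald_dt(theta, h0, mean_increment, dt):
    """Equation 12."""
    return theta * dt / mean_increment * math.tanh(-h0 * theta / 2.0)

# Assumed illustrative parameters.
N, dt = 240, 1e-3
lam_p, lam_n = 42.56, 37.44
log_ratio = math.log(lam_p / lam_n)

# SPRT on independent pools (Equations 16 and 17).
h0_sprt = -1.0
ew_sprt = dt * N * (lam_p - lam_n) * log_ratio

# Spike integration on independent pools (Equations 18 and 19).
h0_si = -log_ratio
ew_si = dt * N * (lam_p - lam_n)

# Thresholds related by the change of variables in the text.
theta_si = 15.0
theta_sprt = theta_si * log_ratio

fc_si, dt_si = wald_fc(theta_si, h0_si), wald_dt(theta_si, h0_si, ew_si, dt)
fc_sprt, dt_sprt = wald_fc(theta_sprt, h0_sprt), wald_dt(theta_sprt, h0_sprt, ew_sprt, dt)

print(f"SI:   FC = {fc_si:.4f}, DT = {1000 * dt_si:.3f} ms")
print(f"SPRT: FC = {fc_sprt:.4f}, DT = {1000 * dt_sprt:.3f} ms")
```

Both lines should agree to within floating-point rounding, since the factor $\log(\lambda_p/\lambda_n)$ cancels between the threshold and the expected increment.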



2.4 Correlated neural populations: the additive and subtractive models 

We next describe two models for introducing correlations into the Poisson spike trains of each neural
population. Both models are studied in [25, 39], and rely on shared input from a single correlating
process to generate the correlations in each pool. These authors termed the two models SIP and
MIP for single- and multiple-interaction process; here we use the added descriptors "additive" and
"subtractive." In both models, a realization of correlated spike trains that provide the input to the
accumulation models is achieved via a common correlating train.

Before describing the models in detail, we note that in this study, these models are statistical
approaches chosen to illustrate a range of impacts that correlations can have on decision making
(see also [23, 31]). In contrast, in neurobiological networks, correlated spiking arises through a
complex interplay of many mechanisms, including recurrent connectivity and shared feedforward
interactions (for example, [H EH EZ]). While beyond the scope of the present paper, avenues for
bridging the gap between statistical and network-based models of correlations in the context of
decision making are considered in the Discussion.

The first case is the additive (SIP) model, in which the spike train for each neuron is generated
as the sum of two homogeneous Poisson point processes. The first Poisson train is generated with an
overall firing rate of $(1 - \rho)\lambda$, where $\lambda$ is the intended firing rate of the neuron, and $\rho$ is the intended
pairwise spike count correlation between any two neurons in the pool. The second train, with a
rate of $\rho\lambda$, is added to every neuron in the pool, and serves as the common source of correlations.
An example of this model of spike train generation is depicted in the rastergrams in Figure 1A and
B; the common spike events are evident as shared spikes across the entire population.

The second case is the subtractive (MIP) model, in which correlated spikes are generated
through random, independent deletions from an original "mother" spike train; we refer to this
as the correlating spike train [25]. There is a separate correlating spike train for each of the two
independent populations. In order to achieve an overall firing rate of $\lambda$ spikes per second for each
neuron in the pool, with a pairwise correlation $\rho$ between any two individual neurons, the correlating
train has a rate of $\lambda/\rho$ spikes per second. Then, for each neuron in the pool, each spike from this
train is included IID with a probability of $\rho$. An example of this model of spike train generation is
depicted in the rastergrams in Figure 1D and E.

In summary, the two models both include correlated spike events that originate from a
single "mother" train. Although they produce identical correlations among cell pairs, these events are
distributed in different ways across the entire population. We note that the results of [H] can be
seen as a limiting case as $\rho \to 0$ of either the additive (SIP) or subtractive (MIP) models.
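The two constructions can be sketched in a few lines. The code below is a hedged illustration, not the simulation code of the original study: the function names, the 50 ms bin width, the pool size of 5, and the number of bins are all assumptions made for the example. It draws binned spike counts from each model and estimates the single-neuron rate and the pairwise spike count correlation, both of which should come out close to the intended $\lambda$ and $\rho$ for either model.

```python
import math
import random

def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's method (adequate for modest means)."""
    limit, count, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def sip_counts(rng, n_neurons, rate, rho, dt):
    """Additive model: a common train (rate rho*lambda) added to private trains."""
    common = poisson(rng, rho * rate * dt)
    return [common + poisson(rng, (1.0 - rho) * rate * dt) for _ in range(n_neurons)]

def mip_counts(rng, n_neurons, rate, rho, dt):
    """Subtractive model: each mother spike (rate lambda/rho) kept with probability rho."""
    mother = poisson(rng, rate / rho * dt)
    return [sum(rng.random() < rho for _ in range(mother)) for _ in range(n_neurons)]

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def estimate(model, seed, n_bins=20000, n_neurons=5, rate=40.0, rho=0.15, dt=0.05):
    """Return (estimated rate of neuron 0, correlation between neurons 0 and 1)."""
    rng = random.Random(seed)
    samples = [model(rng, n_neurons, rate, rho, dt) for _ in range(n_bins)]
    first = [s[0] for s in samples]
    second = [s[1] for s in samples]
    return sum(first) / (n_bins * dt), pearson(first, second)

sip_rate, sip_rho = estimate(sip_counts, seed=1)
mip_rate, mip_rho = estimate(mip_counts, seed=2)
print(f"SIP: rate = {sip_rate:.1f} Hz, pairwise correlation = {sip_rho:.3f}")
print(f"MIP: rate = {mip_rate:.1f} Hz, pairwise correlation = {mip_rho:.3f}")
```

Pairwise statistics alone cannot separate the two models here; the difference only appears in statistics of three or more neurons, for example in how often every neuron in the pool spikes in the same bin.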

3 Subtractive (MIP) correlations and decision making performance 
3.1 The SPRT decision making model 

We now study the impact of subtractive (MIP) correlations on decision making performance.
Recall that within a time window $\Delta T$, the spike counts from each neuron form a vector
of random variables which are independent from window to window. These independent vectors
provide the evidence for each of the two alternatives, which is then weighed via log-likelihood at
each step in the SPRT. In Appendices A.1 and A.4, we compute the values $h_0$ and $E[W]$ that define
the speed and accuracy of the SPRT (see Equations 11 and 12) for two pools with subtractive (MIP)
correlations. As this computation is done in continuous time, it is natural to take $\Delta T \to 0$; doing
so, we find:

$$h_0 = -1 \tag{20}$$
$$E[W] = \frac{1 - (1-\rho)^N}{\rho}\,(\lambda_p - \lambda_n)\log\left(\frac{\lambda_p}{\lambda_n}\right)\Delta T + O(\Delta T^2). \tag{21}$$

Comparing these values against those of the independent SPRT given in Equations 16 and 17, we
see that the only effect of correlations is a scaling of the expected increment via $(1 - (1-\rho)^N)/\rho$,
which takes the place of the factor $N$. In the limit as $\rho \to 0$, this scale factor approaches $N$, which
in turn reduces decision time (DT is inversely proportional to the scale factor via Equation 12). On
the other hand, as $\rho \to 1$, the scale factor itself approaches 1; this agrees with the intuition that as
all neurons become perfectly redundant, the performance should resemble that of a single neuron.
In fact, the mechanism of the SPRT on a given sample can be seen as inferring the firing rate of the
correlating train from a derived vector of noisy random variables. As $N$ gets large, then, performance
should be limited by performing an SPRT on the correlating "mother" trains themselves. This is
precisely what happens: when $N \to \infty$ in Equation 21, we obtain
$E[W] \approx \frac{1}{\rho}(\lambda_p - \lambda_n)\log(\lambda_p/\lambda_n)\,\Delta T$, corresponding to decision
making based on mother spikes of rate $\lambda_p/\rho$ and $\lambda_n/\rho$.
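The limits of the scale factor in Equation 21 are easy to confirm numerically. This is a minimal sketch with a hypothetical helper name, using $N = 240$ as in the figures.

```python
def mip_scale_factor(n_neurons, rho):
    """Scale factor (1 - (1 - rho)**N) / rho from Equation 21."""
    return (1.0 - (1.0 - rho) ** n_neurons) / rho

N = 240

near_independent = mip_scale_factor(N, 1e-6)  # rho -> 0: approaches N
typical = mip_scale_factor(N, 0.15)           # large N: approaches 1 / rho
fully_redundant = mip_scale_factor(N, 1.0)    # rho -> 1: approaches 1

print(f"rho = 1e-6: {near_independent:.2f}")
print(f"rho = 0.15: {typical:.3f} (1/rho = {1 / 0.15:.3f})")
print(f"rho = 1.00: {fully_redundant:.1f}")
```

At $\rho = 0.15$ the factor is already indistinguishable from $1/\rho$, so a pool of 240 neurons carries about as much evidence per unit time as 6 or 7 independent neurons would.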

One consequence of this interpretation is that the particular realization of a spike vector (in a
sufficiently small time bin $\Delta T$) carries no evidence about the decision of $H_1$ vs. $H_0$, beyond its
identity as either the zero vector or not. Of course, this is a consequence of the construction
of the MIP model, as the spike deletions that create the realization of the spike vector have no
dependence on the firing rate of the population. Concretely then, the increments (or decrements)
are based solely on whether the vector of spikes in the preferred (or null) pool contains any spikes
at all; the actual number of spikes is irrelevant in the SPRT.

Figure 1: Spike integration (SI) and SPRT for a single trial, with subtractive (MIP) correlations
(A,B,C) and additive (SIP) correlations (D,E,F). Rastergrams at $C = 6.4$ for preferred (A,D) and
null (B,E) populations of 5 neurons, with spike count correlation within pools $\rho = 0.15$. In (C,F)
these spikes are either integrated (black line) or provide input for the SPRT (gray line), until a
decision threshold is reached. The decision threshold has been set so that all four cases will yield the
same mean reaction time (in C, $\theta_{SI} = 15$ and $\theta_{SPRT} = 1.28$, and in F, $\theta_{SI} = 14$ and $\theta_{SPRT} = 1.28$;
in both cases the SPRT lines have been scaled for plotting purposes). On these trials, the SPRT
accumulator crosses the "correct", upper, threshold, as opposed to the "incorrect", lower, threshold
for the spike integrator. Unlike the independent case, the time evolution of the spike integration
process is not simply a scaled version of the SPRT (though they are clearly similar) under either
model of correlations.

It follows that the accumulation process $E_n$ is a discrete-space random walk, with steps $\pm\log(\lambda_p/\lambda_n)$.
To see this, note that for sufficiently small $\Delta T$, there are only three possibilities for how spikes
will be emitted from the two populations. First, both the preferred and null pools could produce
no spikes. This event provides no information to distinguish the firing rates of the pools, so the
increment is 0. Second, one of the pools could produce a vector of spikes caused by IID deletions
from the "mother" spike train. If the spiking pool is the preferred one, each possible nonzero spike
vector will increment the accumulator by the log of the ratio $(\lambda_p/\rho)/(\lambda_n/\rho)$; the opposite sign
occurs if the null pool spikes. Third, both pools could spike; such events are of higher order in $\Delta T$,
and thus become negligible for small time windows.
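This discrete walk can be simulated directly. The sketch below is illustrative and rests on one stated assumption: because informative events arrive from the preferred and null pools at rates proportional to $\lambda_p$ and $\lambda_n$, each nonzero step is taken to be upward with probability $\lambda_p/(\lambda_p + \lambda_n)$. The threshold is placed at exactly 10 steps, so there is no overshoot and the simulated accuracy should match Equation 11 with $h_0 = -1$. The function name, trial count and seed are hypothetical choices.

```python
import math
import random

lam_p, lam_n = 42.56, 37.44
step = math.log(lam_p / lam_n)       # size of one SPRT increment
p_up = lam_p / (lam_p + lam_n)       # assumed probability that a step is upward

def simulate_fc(k_steps, n_trials, seed):
    """Fraction of walks that reach +k_steps before -k_steps (in units of step)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        position = 0
        while abs(position) < k_steps:
            position += 1 if rng.random() < p_up else -1
        correct += position > 0
    return correct / n_trials

k_steps = 10
theta = k_steps * step                       # threshold at a multiple of the step
fc_wald = 1.0 / (1.0 + math.exp(-theta))     # Equation 11 with h0 = -1
fc_sim = simulate_fc(k_steps, 20000, seed=0)

print(f"theta = {theta:.3f}, FC (Equation 11) = {fc_wald:.3f}, FC (simulated) = {fc_sim:.3f}")
```

Raising the threshold slightly above `k_steps * step` would not change `fc_sim`, since the walk still needs the same number of net steps to cross; this is the origin of the plateaus in accuracy described in the text.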

The discrete nature of the SPRT random walk causes the FC curve in Figure 2A to take on only
discrete values of accuracy; a small increase in $\theta$ above a multiple of $\log(\lambda_p/\lambda_n)$ will not improve
accuracy because $E_n$ on the final, threshold-crossing step will overshoot the threshold. This also
explains why some of the FC values at a given $\theta$ do not lie on the theoretical line defined by
Equation 11; that equation is only exactly true in the case of zero overshoot past the threshold.
We will return to this point later, and also in Appendix C.

We next insert the values for $h_0$ and $E[W]$ computed above into Equations 11 and 12 and plot
the resulting speed-accuracy curves relating DT and FC parametrically in the threshold $\theta$ (Figure
2B). (We plot the full FC and DT functions, although only discrete values of performance along
each of the lines are achievable in practice, as indicated by the dots for the $\rho = 0.15$ case; see caption.)
By comparing speed-accuracy curves for different values of $\rho$ ranging from 0 to 0.3, we see our first
main result: introducing MIP correlations within neural populations substantially diminishes the
best-possible decision performance, that obtained via the SPRT. We will next derive the analogous
results for the simpler spike integration model.

3.2 The spike integration decision making model 

Next, we consider decision making performance for the simpler model in which spikes are simply
integrated over time, as opposed to the likelihood ratio computation of the SPRT. In this
case, the moment generating function of the difference in spike counts from the two pools is more
straightforward (see Appendix B.3), and provides an easy computation of $E[W]$:

$$E[W] = \Delta T\,N\,(\lambda_p - \lambda_n). \tag{22}$$

The nontrivial root $h_0$ of the MGF is found to be the implicit solution of:

$$\left((1 + \rho(e^{h_0} - 1))^N - 1\right)\lambda_p + \left((1 + \rho(e^{-h_0} - 1))^N - 1\right)\lambda_n = 0. \tag{23}$$
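Equation 23 has no closed-form solution, but its negative root is easy to find numerically. The sketch below uses plain bisection; it is an illustration under stated assumptions (the bracket $[-2\log(\lambda_p/\lambda_n), -10^{-9}]$ is assumed to contain the root, which holds for the parameter values used here), and the function names are hypothetical.

```python
import math

def mgf_condition_mip(h, n_neurons, rho, lam_p, lam_n):
    """Left-hand side of Equation 23, written with expm1/log1p for accuracy."""
    up = math.expm1(n_neurons * math.log1p(rho * math.expm1(h)))
    down = math.expm1(n_neurons * math.log1p(rho * math.expm1(-h)))
    return up * lam_p + down * lam_n

def nontrivial_root(func, lower, upper, iterations=200):
    """Bisection, assuming func(lower) > 0 > func(upper)."""
    for _ in range(iterations):
        mid = 0.5 * (lower + upper)
        if func(mid) > 0.0:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)

lam_p, lam_n, N = 42.56, 37.44, 240
log_ratio = math.log(lam_p / lam_n)

def h0_mip(rho):
    """Nontrivial (negative) root of Equation 23 for a given correlation."""
    return nontrivial_root(
        lambda h: mgf_condition_mip(h, N, rho, lam_p, lam_n),
        -2.0 * log_ratio,
        -1e-9,
    )

h0_correlated = h0_mip(0.15)
h0_weak = h0_mip(1e-4)

print(f"h0 at rho = 0.15: {h0_correlated:.5f}")
print(f"h0 at rho = 1e-4: {h0_weak:.5f} (independent value: {-log_ratio:.5f})")
```

For very weak correlations the root approaches the independent value $-\log(\lambda_p/\lambda_n)$ of Equation 18, while at $\rho = 0.15$ it is much closer to zero, which lowers accuracy at a fixed threshold through Equation 11.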

Here we see that correlations only impact the performance of the model through changing $h_0$, as the
expected increment is the same as in the independent case (Equation 19). Moreover, performance
under spike integration is diminished to a degree that is comparable to the performance loss of the
SPRT. To illustrate this, Figure 3A plots the speed-accuracy tradeoff curves from both models of
decision making under subtractive correlations, for the same values of $\rho$. As we must [42], we see
the optimal character of the SPRT in the fact that at a given level of accuracy, the SPRT requires,
on average, fewer samples than spike integration. However, the difference is very slight. This yields
our next main result, that nearly optimal decisions are produced by the simple operation of linear
integration over time for the MIP model of spike correlations across neural populations.

[Figure 2 axis labels: Decision Threshold ($\theta$); Decision Time (DT), ms]

Figure 2: Subtractive (MIP) correlations significantly diminish decision performance under the SPRT
($C = 6.4$, $N = 240$). (A) The discrete nature of the SPRT diffusion process implies that only discrete
values of accuracy are possible. These occur at values of $\theta$ that are multiples of $\log(\lambda_p/\lambda_n) \approx 0.128$.
(Similar results hold for decision time, not shown.) The solid dots are simulations of the SPRT,
and gray dots are exact values taken at multiples of the log ratio; the interpolating line is Equation
11. (B) Accuracy (Equation 11) and decision time (Equation 12) are plotted parametrically as
a function of threshold, for 8 different values of $\rho$ (linearly spaced on $[0, 0.35]$, with a
double-thickness line at $\rho = 0$). Performance of the simulation at multiples of the log ratio of firing
rates is plotted as solid dots, and theoretical values in gray (gray dots are enlarged to be distinguished).

Having established this, we pause to note a subtlety in our analysis. Figures 3B and C show
FC and DT as a function of $\theta$, for both simulated data and plots of Equations 11 and 12. The solid
lines are the graphs of those equations as written (using the values for $h_0$ and $E[W]$ in Equations
23 and 22), and the mismatch between the lines and the data is a consequence of overshoot past
the threshold. The broken line is a graph of the same formulas, with a shift $\theta \to \theta + 14.5$,
an offset computed as the sample mean of the overshoot distribution (see Figure 6 as well as the
discussion in Appendix C; also [20, 27]). This correction term helps the FC and DT equations
better approximate the data when there is potential overshoot. Interestingly, however, parametric
plots like Figure 3A already take this effect into account.

4 Additive (SIP) correlations and decision making performance 
4.1 The SPRT decision making model 

As described in Section 2.4, the additive (SIP) model of spike train correlations also utilizes a common
spike train to generate correlations, but does so in a manner that gives a distinct population-wide
correlation structure. We now derive the consequences for decision making performance under
the SPRT. In Appendices A.1 and A.3, we find the expressions for the parameters of the FC and
DT curves, as the window size $\Delta T \to 0$:

$$h_0 = -1 \tag{24}$$
$$E[W] = (N(1-\rho) + \rho)\,(\lambda_p - \lambda_n)\log\left(\frac{\lambda_p}{\lambda_n}\right)\Delta T + O(\Delta T^2). \tag{25}$$

Figure 3: For the subtractive (MIP) model of spike correlations, decision making performance of
the spike integration model is comparable to the SPRT, and is well described by Equations 11 and
12 despite overshoot past the decision threshold. (A) Gray lines are reproductions of speed-accuracy
curves from the SPRT (Figure 2), and black lines are speed-accuracy curves for spike
integration. (B,C) Overshoot past the decision boundaries reduces the validity of Wald's approximations,
but a constant shift in threshold can help mitigate the effect (see [20, 27] and Appendix
C). Such a shift is automatically accounted for when comparing curves that are parametric in $\theta$
(Panel A, for example).

[Figure 4 axis labels: Decision Threshold ($\theta$); Decision Time (DT), ms]

Figure 4: Additive (SIP) correlations do not significantly diminish decision performance under the
SPRT. (A) The discrete diffusion with increment $\pm\log(\lambda_p/\lambda_n) \approx 0.128$ gives the same accuracy as
the subtractive (MIP) correlations case (Figure 2A) at each value of $\theta$. Because of the absence
of overshoot, the FC and DT relationships can be applied exactly. (B) However, the resulting
speed-accuracy curves are very different. In particular, the impact of correlations on the speed-accuracy
tradeoff is much smaller than for subtractive correlations (cf. Figure 2B, noting that here
the abscissa ranges up to 30 ms, in contrast to 800 ms). Here only $\rho = 0$, 0.15, and 0.3 are plotted
for clarity.

Comparing these with Equations 16 and 17, we see that, as in the subtractive (MIP) correlations
model, the only difference with the independent case is a scaling factor on the average increment
$E[W]$ in Equation 25. To explain the form of the scale factor, note that the spike vector from
each pool is composed of $N$ independent spike trains firing at rate $\lambda(1 - \rho)$, and a single (highly
redundant) spike train firing at a rate $\lambda\rho$.

As in the subtractive (MIP) model, $E_n$ here also becomes a discrete random walk with increment
$\pm\log(\lambda_p/\lambda_n)$. This can be seen by noting that for either pool, in a sufficiently small $\Delta T$ window,
only one of two events is possible: (i) no spikes occur at all, or (ii) a single spike occurs in one
neuron, in one of the two pools. The first case is uninformative about either $H_1$ or $H_0$. The
second case occurs with probability proportional to $\lambda(1 - \rho)$ under $H_1$ and $\lambda'(1 - \rho)$ under $H_0$
(here $\lambda = \lambda_p$ and $\lambda' = \lambda_n$ if the spike occurred in the preferred pool, for example); taking the log
ratio, the factor $(1 - \rho)$ cancels and we find that our increment is independent of correlations.
The resulting decision accuracy (FC) is plotted vs. threshold in Figure
4A, and is qualitatively similar to the subtractive (MIP) correlations case, with plateaus following
from the discrete nature of $E_n$. However, the speed-accuracy tradeoff pictured in Figure 4B is very
different from that found in the subtractive (MIP) model.

In particular, we see our third main result: the impact of additive correlations on optimal
(SPRT) decision performance is relatively minor. For example, in the presence of pairwise correlations
as strong as $\rho = 0.3$, the mean decision time required to reach a typical value of accuracy is
increased by only a few milliseconds compared with the independent case, instead of by hundreds of
milliseconds as for subtractive correlations. Equation 25 offers an intuitive explanation for this fact:
DT is inversely proportional to $E[W]$, which does not diminish nearly as fast for SIP correlations
as for MIP correlations (cf. Equation 21).
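The size of this effect can be estimated from Equation 12 with $h_0 = -1$ and the scale factors of Equations 17, 21 and 25. The following is a rough sketch under assumed illustrative values ($N = 240$, $\rho = 0.3$, and a threshold of ten increments); the function name is hypothetical and the $O(\Delta T^2)$ terms are ignored.

```python
import math

def sprt_decision_time(theta, scale, lam_p, lam_n):
    """Equation 12 with h0 = -1 and E[W]/dT = scale * (lam_p - lam_n) * log(lam_p/lam_n)."""
    drift = scale * (lam_p - lam_n) * math.log(lam_p / lam_n)
    return theta / drift * math.tanh(theta / 2.0)

lam_p, lam_n, N, rho = 42.56, 37.44, 240, 0.3
theta = 10 * math.log(lam_p / lam_n)   # ten SPRT increments

scale_independent = N                        # Equation 17
scale_sip = N * (1 - rho) + rho              # Equation 25
scale_mip = (1 - (1 - rho) ** N) / rho       # Equation 21

dt_independent = sprt_decision_time(theta, scale_independent, lam_p, lam_n)
dt_sip = sprt_decision_time(theta, scale_sip, lam_p, lam_n)
dt_mip = sprt_decision_time(theta, scale_mip, lam_p, lam_n)

for label, value in [("independent", dt_independent), ("SIP", dt_sip), ("MIP", dt_mip)]:
    print(f"{label:12s} DT = {1000 * value:7.1f} ms")
```

With these values the three decision times should come out at roughly 5 ms, 7 ms and 330 ms, consistent with the few-millisecond versus hundreds-of-milliseconds contrast described above.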
[Figure 5 axis labels: Decision Threshold ($\theta$); Decision Time (DT), ms]

Figure 5: Decision making performance of the spiking integrator model with additive (SIP) correlations
is comparable to that with subtractive correlations: correlations significantly decrease performance.
(A) Black lines give the speed-accuracy tradeoff predicted using $h_0$ and $E[W]$ from Equations 26
and 27 (and thereby assuming no overshoot of the decision threshold). Performance is similar to
the subtractive-correlations case (broken gray lines), and significantly worse than performing the SPRT
on additive-correlated inputs (solid gray lines). (B) At $\rho = 0.15$, for example, major differences arise
between this theory (again, solid black line, reproduced from A) and simulation of the model (dots),
especially at short reaction times. This is a consequence of significant overshoot of $E_n$ over the
decision threshold, on the threshold-crossing step. (Inset) At short reaction times, the simulations
actually perform closer to the SPRT (gray line, reproduced from Figure 4A); see text.

4.2 The spike integration decision making model 

What about the ability of the simple spike integrator to perform decision making when confronted
with additive correlations? Proceeding as in the subtractive-correlations case, we derive an implicit
relationship for $h_0$, and the expected increment $E[W]$:

$$\lambda_p\left(\rho(e^{Nt} - 1) + (1-\rho)N(e^{t} - 1)\right) + \lambda_n\left(\rho(e^{-Nt} - 1) + (1-\rho)N(e^{-t} - 1)\right) = 0 \iff h_0 = t \tag{26}$$
$$E[W] = \Delta T\,N\,(\lambda_p - \lambda_n). \tag{27}$$
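As with the subtractive model, the root of Equation 26 can be found by bisection. This is a hedged sketch with hypothetical function names; it assumes the negative root lies in $[-2\log(\lambda_p/\lambda_n), -10^{-9}]$, which holds for the values used here, and that $N$ is small enough for $e^{Nt}$ not to overflow on that interval.

```python
import math

def mgf_condition_sip(t, n_neurons, rho, lam_p, lam_n):
    """Left-hand side of Equation 26."""
    up = rho * math.expm1(n_neurons * t) + (1.0 - rho) * n_neurons * math.expm1(t)
    down = rho * math.expm1(-n_neurons * t) + (1.0 - rho) * n_neurons * math.expm1(-t)
    return lam_p * up + lam_n * down

def nontrivial_root(func, lower, upper, iterations=200):
    """Bisection, assuming func(lower) > 0 > func(upper)."""
    for _ in range(iterations):
        mid = 0.5 * (lower + upper)
        if func(mid) > 0.0:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)

lam_p, lam_n, N = 42.56, 37.44, 240
log_ratio = math.log(lam_p / lam_n)

def h0_sip(rho):
    """Nontrivial (negative) root of Equation 26 for a given correlation."""
    return nontrivial_root(
        lambda t: mgf_condition_sip(t, N, rho, lam_p, lam_n),
        -2.0 * log_ratio,
        -1e-9,
    )

h0_independent = h0_sip(0.0)
h0_correlated = h0_sip(0.15)

print(f"h0 at rho = 0:    {h0_independent:.5f} (Equation 18: {-log_ratio:.5f})")
print(f"h0 at rho = 0.15: {h0_correlated:.5f}")
```

Setting $\rho = 0$ recovers the independent value of Equation 18, which is a useful check on the reconstruction; for $\rho = 0.15$ the root moves much closer to zero, so that a larger threshold is needed to reach the same accuracy in Equation 11.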

By comparing with Equation 19, we see that, as for spike integration in the subtractive (MIP)
case, correlation affects only the value of $h_0$ and not the expected increment. Substituting these
values into Equations 11 and 12, we then plot the speed-accuracy tradeoff curves for this model
under the assumption of no overshoot in Figure 5A. It appears that, when decisions are made via
spike integration, correlations impact performance quite significantly (black lines), in contrast to the
SPRT case (solid gray lines, reproduced from Figure 4B). Overall, the degree of performance loss is
comparable to that under subtractive correlations (broken gray lines, reproduced from Figure 2B).
This is our fourth main result: for additive correlations, if decisions are made via spike integration
instead of the SPRT, correlations have a significant impact on reducing decision performance.

However, the assumption that integrated spikes do not overshoot the decision threshold might
seem suspect under the additive model of correlations, as there is a possibility that the
threshold-crossing step might occur as a result of every neuron in a pool spiking simultaneously.
In fact, when the number of neurons in the pool is large (as in the cases we consider), additive
correlations can indeed cause significant overshooting of thresholds; importantly, and unlike for
subtractive (MIP) correlations, this effect cannot be compensated via a constant offset in the
decision threshold.

Figure 6: Overshoot distributions for spike integration under additive (SIP) and subtractive (MIP)
correlations. The random variable $X$ indicates the distribution of $E_n - \theta$ conditioned on crossing
the upper threshold (similar results for the lower threshold are not shown). The probability mass
function (PMF) of $X$ varies as a function of $\theta$, and two vertical slices through this density are
shown at $\theta = 50$ and 250. Here the overshoot distributions are discrete, due to the integer-valued nature
of the increment distribution. For plotting purposes, the vertical axis has been split in the SIP case,
to allow plotting of the outlier point at zero. The black line indicates $E[X]$ as $\theta$ varies; crucially,
this quantity varies significantly and for higher values of $\theta$ under SIP correlations, resulting in the
non-monotonic speed-accuracy tradeoff pictured in Figure 5.

Figure 5B demonstrates the consequences for the speed-accuracy tradeoff. Here, when the spike
integration model is simulated directly, we see a surprising non-monotonic relationship between FC
and DT in the presence of additive correlations of strength ρ = 0.15. This violates the usual intuition
that accuracy should increase at slower decision speeds. The explanation is that, as the decision
threshold is raised, DT correspondingly increases while accuracy suffers: a longer trial is more
likely to include a (relatively rare) spike in the correlating spike train of one of the two pools,
which causes the accumulator to jump far beyond the threshold before the trial would otherwise finish.

For large thresholds, the sequential sampling theory of Equations 11 and 12, which assume no
overshoot, accurately approximates the simulated data; however, for low values of θ the
approximation is poor. In fact, the inset to Figure 5B shows that in this regime, the decision making
performance of the spike integration model is far better described by the theory predicted by the
SPRT. The intuition behind this observation is that for short reaction times, there is only a small
probability of a shared spike that will send the integrator significantly over the threshold. This allows
accumulation to occur one spike at a time (for sufficiently small ΔT), where each spike arrives
from an independent spike train. As we have seen, the process of integrating independent spikes is
equivalent to the SPRT. It is only at longer decision times, when the chances of having integrated
a large common spike event are larger, that a significant impact of correlations appears.
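
The mechanism described above can be explored with a direct simulation of the spike integrator under additive correlations. The sketch below is an illustration under stated assumptions rather than the simulation used for the figures: it assumes each pool's count in a window ΔT is N times a shared Poisson count (mean ρλΔT) plus the summed independent counts (mean N(1 - ρ)λΔT), and all function names and parameter values are hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sip_trial(theta, lam_p, lam_n, rho, n, dt, rng):
    """One spike-integration trial under additive (SIP) correlations.

    Returns (correct, decision_time, overshoot), where overshoot is the
    distance of the accumulator beyond whichever threshold was crossed.
    """
    e, steps = 0, 0
    while abs(e) < theta:
        steps += 1
        # A shared spike is counted once per neuron, hence the factor n.
        pref = n * poisson(rho * lam_p * dt, rng) + poisson(n * (1 - rho) * lam_p * dt, rng)
        null = n * poisson(rho * lam_n * dt, rng) + poisson(n * (1 - rho) * lam_n * dt, rng)
        e += pref - null
    return e >= theta, steps * dt, abs(e) - theta


def summarize(theta, lam_p, lam_n, rho, n, dt, trials, rng):
    """Fraction correct, mean decision time and mean overshoot over many trials."""
    results = [sip_trial(theta, lam_p, lam_n, rho, n, dt, rng) for _ in range(trials)]
    fc = sum(1 for correct, _, _ in results if correct) / trials
    mean_dt = sum(t for _, t, _ in results) / trials
    mean_overshoot = sum(x for _, _, x in results) / trials
    return fc, mean_dt, mean_overshoot
```

Sweeping `theta` in `summarize` (for instance with `rng = random.Random(0)`, `rho = 0.15`, `n = 100`, `dt = 0.001`) gives a rough picture of how FC, DT and the mean overshoot co-vary; trials that end with zero overshoot are those that never integrated a shared event.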

Figure 6 provides further evidence for this scenario. Density plots of the distribution of the
overshoot X = E_n - θ (conditioned on crossing the upper threshold), for both additive (SIP) and
subtractive (MIP) correlations, are shown as a function of the decision threshold, with particular
overshoot distributions plotted at θ = 50 and 250. For the additive correlations model, a significant
fraction of the trials terminate with zero overshoot at low values of θ (because, for example, large
correlating events are relatively rare), implying that many trials underwent optimal accumulation
of evidence, without experiencing a common, correlating spike event as discussed above.

Figure 7: Increments for the SPRT are nonlinear when input spikes are correlated. (A) For both
additive and subtractive correlations, the spike integration model of decision making implies a
linear mapping between the number of spikes in the preferred and null populations, and the increment
to the accumulator. (B) With subtractive correlations, a severe nonlinearity means that only
increments of ±log(λ_p/λ_n) occur. This stands in direct contrast to the optimality of linear summation
in the zero-correlations case. (C) A nonlinear computation also appears as a consequence of the
additive correlations model; however, the nonlinearity is much less severe than in the subtractive
model. (All results pictured hold in the case of vanishing ΔT.)

Overall, the monotonic dependence of accuracy (FC) on decision time (DT) follows from the
invariance of the moments of the overshoot distribution relative to changes in the threshold value
θ; this is particularly true for the first moment (see Appendix C). Figure 6 (SIP) demonstrates that
these moments continue to fluctuate over a larger range of θ, and with larger magnitude, for the
additive correlations model. This serves to explain the strange shape of the speed-accuracy tradeoff
curve pictured in Figure 5B that (unlike the subtractive correlations model) cannot be explained
by a constant shift in θ.

5 Nonlinear computations and optimal performance via the SPRT 

When the neurons in each pool spike independently, Zhang and Bogacz [45] demonstrated that
linear summation of spikes across the two pools at each time step implements the SPRT. Because 
the SPRT is optimal in the sense of minimizing DT for a prescribed level of FC, the conclusion is 
that linear integration of spikes across pools, and then across time, provides an optimal decision 
making strategy. However, is this optimality of linear integration confined to the case of independent 
activity within the pool? 

Above, we showed that when correlations are introduced into this model, it is no longer true
that each spike should be given the same "weight", as in linear integration. Moreover, knowing
only the pairwise correlations and firing rates does not allow one to write down a rule for
the function that should be applied to incoming spikes in order to implement the SPRT, although
in these cases this function takes the form of the difference between the results of a nonlinearity
applied to each pool. This dependence on higher order statistics is demonstrated in Figure 7 by
the fact that the nonlinearities for MIP correlations (Panel B) and SIP correlations (Panel C) take
a significantly different form.

Figure 8: Optimal performance via spike integration under additive correlations can be realized
with a simple nonlinearity. (A) A nonlinearity discounts the contribution to the accumulator of a
shared spike event (see Equation 30). (B) Spike integration with this nonlinearity is suggested by
Figure 7C, and recovers performance of the decision making model (black dots) to agreement with
the results of SPRT (gray line). Without this nonlinearity to discount shared events, performance
suffers (gray dots, reproduced from Figure 5B, inset).

For MIP correlations, the nonlinearity pictured in Figure 7B that implements the SPRT (up to
a change in threshold) takes the form:

$$ W_i = f_{\mathrm{MIP}}(S_p^i) - f_{\mathrm{MIP}}(S_n^i) \qquad (28) $$

At first glance, it is surprising that such a severe nonlinearity, applied to two MIP-correlated
spiking pools, results in nearly the same performance as simple spike integration (cf. Figure 3).
The intuition here is that optimal inference requires essentially performing spike integration on the
correlating spike train, as no information about the firing rate is added through spike deletions.
This random walk, whose increments take one of three values (-1, 0, or +1), is approximated by
linear integration in the limit as the size of the pool (N) increases.

Another perspective on the nonlinearities that enable optimal computation is that they leverage
knowledge about the mechanism of correlations to improve performance. In the SIP model, the
nonlinear function depicted in Figure 7C is, as in the MIP case, a consequence of applying a
nonlinearity to each pool, and then subtracting. However, in this case, the form is not as drastic:
a shared spike event coming from the correlating train only registers as a single spike:

$$ W_i = f_{\mathrm{SIP}}(S_p^i) - f_{\mathrm{SIP}}(S_n^i) \qquad (30) $$

Figure 9: The joint cumulants of the SIP and MIP processes differ for pools of greater than
two neurons. Under the additive (SIP) model, the joint cumulant of the spike counts from N
neurons is constant for all N > 2. In contrast, the joint cumulants of the subtractive (MIP)
model decay geometrically as the pool size increases, and this difference helps to characterize the
differences in higher-order correlations between the two models. (See Appendix D for supplementary
computations.)



Intuitively, this strategy uses the fact that a simultaneous spike in every neuron in a pool only has
one explanation for a sufficiently small window of integration, and therefore uses the correlating
spike train as an additional independent input in the likelihood ratio. At low values of θ, this does
not confer much of an advantage; however, as the threshold increases, higher accuracy is achievable
at much shorter decision times. The nonlinearity is pictured in Figure 8A, and also offers an intuition
as to why, for low threshold values, spike integration performs almost optimally: when spikes from
the correlating train are rare (or can be properly weighted), spike integration implements SPRT
(Figure 5B).
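
To make the two per-pool nonlinearities concrete, the sketch below gives one form of each that is consistent with the verbal descriptions in this section and with Figure 7. These forms are assumptions rather than the paper's stated definitions: `f_mip` only registers whether a pool spiked at all (so increments are -1, 0 or +1, up to the change in threshold), and `f_sip` counts a simultaneous spike in all N neurons as a single spike. Both presume a window ΔT small enough that at most one event occurs per pool.

```python
def f_mip(count):
    """Assumed MIP nonlinearity: any activity in the pool registers as one event."""
    return 1 if count > 0 else 0


def f_sip(count, n):
    """Assumed SIP nonlinearity: a spike in all n neurons registers as a single spike."""
    return 1 if count == n else count


def linear_increment(count_p, count_n):
    """Plain spike integration: every spike carries the same weight."""
    return count_p - count_n


def mip_increment(count_p, count_n):
    """Increment of the form f(S_p) - f(S_n), as in Equation 28."""
    return f_mip(count_p) - f_mip(count_n)


def sip_increment(count_p, count_n, n):
    """Increment of the form f(S_p) - f(S_n), as in Equation 30."""
    return f_sip(count_p, n) - f_sip(count_n, n)
```

With N = 100, a shared event in the preferred pool moves the linear integrator by 100 (`linear_increment(100, 0)`) but the discounted integrator by only 1 (`sip_increment(100, 0, 100)`), which is the sense in which the nonlinearity removes the large overshoots discussed in Section 4.2.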

6 Discussion 

Correlated spiking among the neurons that encode sensory evidence appears ubiquitous. Such
correlations might arise from any number of neuroanatomical features - the simplest being
overlapping feedforward connectivity, which can cause collective fluctuations across a population [5]
[36] [16] [28]. They can also result from sensory events that impact an entire population, or from rapid
modulatory effects. Moreover, for large neural populations it appears that accurate descriptions of
population-wide activity can require not only the typically measured pairwise correlations, but
higher order interactions as well [29] [19] WQ .

The aim of our study is to improve our understanding of how correlated activity in these popula- 
tions can impact the speed and accuracy of decisions that require accumulating sensory information 
over time. Faced with the wide range of possible mechanisms and structures of correlations alluded 
to above, we choose to focus on two models for population-wide correlations that illustrate a key 
distinction in how correlations can occur. These models have identical first-order and pairwise
statistics, but differ in whether each common spiking event involves a small subset of the
neurons (the subtractive, MIP case) or every neuron in the pool (the additive, SIP case) [25] [35].

Figure 9 quantifies this difference: based on calculations in Appendix D, we plot the joint
cumulant across k neurons in a pool under both subtractive (MIP) and additive (SIP) correlations. 
While the additive model possesses a constant joint cumulant no matter how many neurons are 
included, the joint cumulant of k neurons falls off geometrically for the subtractive case. We 
conjecture that this is a statistical signature that could suggest when other, more general patterns of 
correlated activity - measured experimentally or arising in mechanistic models of neural circuits [28] 
- will produce similar effects on decisions. Exploring this conjecture via models and data is a target 
of our future research. 
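
The contrast between a constant and a geometrically decaying joint cumulant can be written down in a few lines. The sketch below is not taken from Appendix D; it uses the standard compound-Poisson result under two assumptions about the constructions: in the SIP model a shared train of rate ρλ is added to every neuron, and in the MIP model a common train of rate λ/ρ is copied to each neuron independently with probability ρ. Function names are illustrative.

```python
def sip_joint_cumulant(k, lam, rho, dt):
    """Joint cumulant of the counts of k distinct neurons under additive (SIP) correlations."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # k = 1 is the mean count; for k >= 2 only the shared train contributes.
    return lam * dt if k == 1 else rho * lam * dt


def mip_joint_cumulant(k, lam, rho, dt):
    """Joint cumulant of the counts of k distinct neurons under subtractive (MIP) correlations."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Common-train rate (lam / rho) times the probability rho**k that all k neurons keep the spike.
    return lam * dt * rho ** (k - 1)
```

Under these assumptions the two models agree for k = 1 (mean count) and k = 2 (pairwise covariance), and first differ at k = 3, where the MIP cumulant is smaller by a factor of ρ.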

We summarize our main findings as follows. For both models of correlated spiking, decisions
produced by a simple, linear spike integration model (i.e., a neural integrator) become slower and 
less accurate as correlations increase. However, a strong difference appears for decisions made 
via the optimal decision strategy (SPRT). Here, additive correlations have only a minor impact on 
decision performance, while subtractive correlations continue to strongly diminish this performance. 
The conclusion is that decision making circuits, faced with subtractive (MIP) correlated sensory 
populations, will invariably produce diminished decision performance, and stand little to gain by 
implementing computations more complex than a simple integration of spikes over time and neurons.
However, in the presence of additive (SIP) correlations, circuit mechanisms that implement or
approximate the SPRT - perhaps via a nonlinearity such as that shown in Fig. 8 applied to the
sum of incoming spikes - stand to produce substantially better decision performance than their 
linear counterparts. 

In other contexts, nonlinear computations have also been shown to improve discrimination be- 
tween two alternatives. Field and Rieke [17] [18] demonstrated the importance of a thresholding
nonlinearity in pooling the responses of rod cells, where this nonlinearity served to reject "back- 
ground" noise. Closer to the present setting, gating inhibition that prevents accumulation of noise 
samples before the onset of evidence-encoding stimulus can account for visual search performance 
[32], and recent results suggest that related nonlinearities can improve performance for mistuned 
neural integrators ([H], see also [12]). 

Our cases in which correlations decrease performance - in particular, when spikes are linearly 
integrated - are consistent with several prior studies of the role of correlated activity in decision 
making [46] [8] [15]. We note, however, two differences in our models. The first is the mechanism
through which correlated spikes are generated; while we use additive and subtractive models based
on Poisson processes, the authors of [8], [15] use a multivariate Gaussian description of spike counts.
The second is that in [8] [15], decisions are rendered after a duration that is fixed before the trial
begins (either a single duration, [8], or one that is drawn from a distribution of reaction times,
in [H]). This is different from the setting here, where incoming signal on each trial determines the
reaction time through a bound crossing. 

Our result, in the case of subtractive (MIP) correlations, that linear integration of spikes closely 
approximates the optimal decision making strategy is similar to findings of Beck et al. [4]. Specif- 
ically, they model a dense range of differently tuned populations, and find that optimal Bayesian 
inference can be based on linear integration of inputs, for a wide set of correlation models. Our 
additive (SIP) case, however, behaves differently, as nonlinearities are needed to achieve the optimal 
strategy. 

An aim of future work is extending the setting of our study to include tuning curves as in
HI [45] [15]. This is more realistic for many decision tasks (including the direction discrimination
task), and will also allow progress toward models with multiple decision alternatives. An important
challenge will come from defining pairwise correlations that vary as a function of preferred tuning
orientation (see [46] Q3]), while also including the full structure of correlating events across multiple
cells in a realistic way. For example, in the present paper, additive correlating events occurred 
independently in the two populations; future work could take a more graded approach, in which 
some events impact the entire sensory population (i.e., as in an eyeblink or possibly an attentional 
shift during a visual task). 

As long as each neuron remains modeled as a Poisson point process, the sequential accumulation 
theory utilized here will carry over directly. This points to another limitation of the present study
and opportunity for future work: the lack of temporal correlations in the statistics of the
inputs. A model of correlations that includes spikes from a correlating train that are temporally
jittered [23] [39] could provide a starting place for a model of the input trains; however, defining
updates to the likelihood ratio for the two competing hypotheses will be more difficult. Nevertheless,
it will be interesting to see how our results carry over; in particular, there will be many more different 
combinations of spike events that will contribute to increments for both spike integration and SPRT 
decision models. 

While we therefore view the present study as a first step in exploring many possibilities, our find- 
ings demonstrate how the population-wide structure of correlations - beyond pairwise correlation 
coefficients - can strongly impact the speed and accuracy of decisions, and the circuit operations 
necessary to achieve optimal performance. This suggests that multi-electrode and imaging tech- 
nologies, together with theoretical work on neural coding, will continue to play an exciting role in 
understanding the structure of basic computations like decision making over time.

A Sequential Probability Ratio Test

A.1 Nontrivial root of the moment generating function (SPRT)

The nontrivial real root of the moment generating function (MGF) of a sampling distribution
is critical to finding FC and DT of an independently sampled sequential hypothesis test (via
Equations 11 and 12). For the SPRT, the increment distribution is given in Equation 15 as:



$$ W_i = \log \frac{P[\mathbf{S}_p^i, \mathbf{S}_n^i \mid H_1]}{P[\mathbf{S}_p^i, \mathbf{S}_n^i \mid H_0]} \qquad (32) $$



The "correct" hypothesis H_1 is in the numerator in order to orient a crossing of the positive decision
threshold with a correct choice. Correspondingly, the probability of observing a given sample (S_p^i, S_n^i)
is known from assumption of this hypothesis, and by definition follows the distribution:



$$ P[\mathbf{S}_p^i, \mathbf{S}_n^i \mid H_1] = P[\mathbf{S}_p^i \mid H_1] \, P[\mathbf{S}_n^i \mid H_1] \qquad (33) $$

where the independence of the spike count vectors from the two separate pools, S_p and
S_n, has allowed the factoring of the distribution. Dropping the sampling index i for notational
convenience, the MGF can then be computed as:

$$ \phi_W(s) = E\left[ e^{sW} \mid H_1 \right] = \sum_{\mathbf{s}_p, \mathbf{s}_n} P[\mathbf{s}_p, \mathbf{s}_n \mid H_1] \left( \frac{P[\mathbf{s}_p, \mathbf{s}_n \mid H_1]}{P[\mathbf{s}_p, \mathbf{s}_n \mid H_0]} \right)^{s} \qquad (34) $$

The nontrivial root (h_0 ≠ 0) can then be seen by inspection (cf. Equation 16):

$$ s = h_0 = -1 \implies \phi_W(h_0) = 1 \qquad (35) $$

We note that this computation is fully general, without any assumptions on the structure of corre- 
lations both within and across pools. 



A.2 E[W], independent interactions (SPRT)

The other parameter of the sampling distribution critical to computing the FC and DT functions,
E[W], is computed for independent spike count distributions (ρ = 0, cf. [17]) as follows (see also
051): 

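$$
\begin{aligned}
E[W] &= \sum_{\mathbf{s}_p, \mathbf{s}_n} P[\mathbf{s}_p \mid H_1] P[\mathbf{s}_n \mid H_1] \log \frac{P[\mathbf{s}_p \mid H_1] P[\mathbf{s}_n \mid H_1]}{P[\mathbf{s}_p \mid H_0] P[\mathbf{s}_n \mid H_0]} \qquad (36) \\
&= \sum_{\mathbf{s}_p, \mathbf{s}_n} P[\mathbf{s}_p \mid H_1] P[\mathbf{s}_n \mid H_1] \left( \log P[\mathbf{s}_p \mid H_1] + \log P[\mathbf{s}_n \mid H_1] - \log P[\mathbf{s}_p \mid H_0] - \log P[\mathbf{s}_n \mid H_0] \right) \qquad (37) \\
&= \sum_{\mathbf{s}_p} P[\mathbf{s}_p \mid H_1] \log P[\mathbf{s}_p \mid H_1] + \sum_{\mathbf{s}_n} P[\mathbf{s}_n \mid H_1] \log P[\mathbf{s}_n \mid H_1] \qquad (38) \\
&\quad - \sum_{\mathbf{s}_p} P[\mathbf{s}_p \mid H_1] \log P[\mathbf{s}_p \mid H_0] - \sum_{\mathbf{s}_n} P[\mathbf{s}_n \mid H_1] \log P[\mathbf{s}_n \mid H_0] \qquad (39) \\
&= N \left( E\left[ \log \frac{P[S_{p,1} \mid H_1]}{P[S_{p,1} \mid H_0]} \,\middle|\, H_1 \right] + E\left[ \log \frac{P[S_{n,1} \mid H_1]}{P[S_{n,1} \mid H_0]} \,\middle|\, H_1 \right] \right) \qquad (40) \\
&= N \Delta T \left( \lambda_n - \lambda_p + \lambda_p \log \frac{\lambda_p}{\lambda_n} + \lambda_p - \lambda_n + \lambda_n \log \frac{\lambda_n}{\lambda_p} \right) \qquad (41) \\
&= N \Delta T (\lambda_p - \lambda_n) \log \frac{\lambda_p}{\lambda_n} \qquad (42)
\end{aligned}
$$

Here S_{p,1} and S_{n,1} denote the spike counts of a single neuron in the preferred and null pools.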


When this quantity is substituted into Equation 12, the ΔT will cancel, implying that DT is
not a function of the sampling increment size. We compute this quantity for correlated spike count
distributions next.



A.3 E[W], additively (SIP) correlated interactions (SPRT) 

When neurons within pools are correlated, the joint PDF of the spike count vector is no longer
decomposable into the product of the marginal distributions (the critical step between Equations 39
and 40). However, an expression for E[W] can be obtained in the limit as ΔT → 0, by repeatedly
expanding via Taylor series about ΔT = 0 throughout the computation.

First, we simplify the expression for the expected increment by using the independence of the 
two pools: 



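$$ E[W] = E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, H_1 \right] + E\left[ \log \frac{P[\mathbf{S}_n \mid H_1]}{P[\mathbf{S}_n \mid H_0]} \,\middle|\, H_1 \right] \qquad (43) $$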

Next we expand each term to first order in ΔT; below, we only demonstrate the expansion for the
"preferred" population; the calculation for the null pool follows by exchanging λ_p and λ_n. In that
case, by using the Law of Total Expectation conditioned on the number of spikes Ŝ_p in the common
spike train "shared" across the pool (which spikes at a rate ρ λ_p, giving a mean count of ρ λ_p ΔT per window), we have:



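$$
\begin{aligned}
E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, H_1 \right]
&= E\left[ E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, \hat{S}_p, H_1 \right] \,\middle|\, H_1 \right] \qquad (44) \\
&= \sum_{\hat{s}_p = 0}^{\infty} P[\hat{s}_p \mid H_1] \, E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, \hat{S}_p = \hat{s}_p, H_1 \right] \qquad (45) \\
&= (1 - \rho \lambda_p \Delta T) \, E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, \hat{S}_p = 0, H_1 \right]
 + \rho \lambda_p \Delta T \, E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, \hat{S}_p = 1, H_1 \right] + O(\Delta T^2) \qquad (46)
\end{aligned}
$$

Taking the case of Ŝ_p = 0,

$$ E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, \hat{S}_p = 0, H_1 \right] = \sum_{\mathbf{s}_p} P[\mathbf{s}_p \mid \hat{S}_p = 0, H_1] \log \frac{P[\mathbf{s}_p \mid H_1]}{P[\mathbf{s}_p \mid H_0]} \qquad (47) $$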

The aim here is to take advantage of the conditioning; because the spike counts of neurons
within the same pool are conditionally independent, given the number of spikes in the correlating
spike train, the joint distribution across the vector s_p becomes the product of the conditioned
marginal distributions. However, this is only true for the first factor in the summand of Equation
47. To continue, we must expand the log-ratio of the probability distributions in ΔT, using the law of
total probability:



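$$
\begin{aligned}
P[\mathbf{s}_p \mid H_1] &= \sum_{\hat{s}_p = 0}^{\infty} P[\hat{s}_p \mid H_1] \, P[\mathbf{s}_p \mid \hat{S}_p = \hat{s}_p, H_1] \qquad (48) \\
&= (1 - \rho \lambda_p \Delta T) \, P[\mathbf{s}_p \mid \hat{S}_p = 0, H_1] + \rho \lambda_p \Delta T \, P[\mathbf{s}_p \mid \hat{S}_p = 1, H_1] + O(\Delta T^2) \qquad (49) \\
P[\mathbf{s}_p \mid H_0] &= \sum_{\hat{s}_p = 0}^{\infty} P[\hat{s}_p \mid H_0] \, P[\mathbf{s}_p \mid \hat{S}_p = \hat{s}_p, H_0] \qquad (50) \\
&= (1 - \rho \lambda_n \Delta T) \, P[\mathbf{s}_p \mid \hat{S}_p = 0, H_0] + \rho \lambda_n \Delta T \, P[\mathbf{s}_p \mid \hat{S}_p = 1, H_0] + O(\Delta T^2) \qquad (51)
\end{aligned}
$$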

Moreover, the N-term summation in Equation 47 need only be over s_i ∈ {0, 1}, as higher values
will produce contributions of higher than first order in ΔT. Two cases emerge for the expansion:
if s_i = 0 for any i, P[s_p | Ŝ_p = 1, H_1] = P[s_p | Ŝ_p = 1, H_0] = 0, and we have:



$$ \log \frac{P[\mathbf{s}_p \mid H_1]}{P[\mathbf{s}_p \mid H_0]} = \log \frac{P[\mathbf{s}_p \mid \hat{S}_p = 0, H_1]}{P[\mathbf{s}_p \mid \hat{S}_p = 0, H_0]} \qquad (52) $$

On the other hand, if s_i = 1 for all i, we can compute the expression directly via total probability,
as there are only four possible ways for the event to originate; to first order in ΔT, this is:



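$$
\begin{aligned}
\sum_{i=0}^{1} \sum_{\hat{s}_p = 0}^{1} P[\mathbf{s}_p = \mathbf{1}, \hat{S}_p = \hat{s}_p \mid H_i] \qquad & (53) \\
= \rho (\lambda_p + \lambda_n) \Delta T + O(\Delta T^N) \qquad & (54)
\end{aligned}
$$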

Therefore, this single element of the sum offers no order-one contribution (it is multiplied by
P[s_p = 1 | Ŝ_p = 0, H_1], which is itself O(ΔT^N)); thus,



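$$ \sum_{\mathbf{s}_p} P[\mathbf{s}_p \mid \hat{S}_p = 0, H_1] \log \frac{P[\mathbf{s}_p \mid \hat{S}_p = 0, H_1]}{P[\mathbf{s}_p \mid \hat{S}_p = 0, H_0]} = N (1 - \rho) \Delta T \left( \lambda_n - \lambda_p + \lambda_p \log \frac{\lambda_p}{\lambda_n} \right) \qquad (55) $$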

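The case of Ŝ_p = 1 is simpler, as only zero-order terms must be kept (due to the coefficient in
Equation 46). Recycling the expansion from Equation 52, we have that to zero order:

$$ E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, \hat{S}_p = 1, H_1 \right] = \sum_{\mathbf{s}_p} P[\mathbf{s}_p \mid \hat{S}_p = 1, H_1] \log \frac{P[\mathbf{s}_p \mid H_1]}{P[\mathbf{s}_p \mid H_0]} = \log \frac{\lambda_p}{\lambda_n} + O(\Delta T) \qquad (56) $$

Finally, combining Equations 46, 55 and 56, we have that:

$$
\begin{aligned}
E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, H_1 \right]
&= (1 - \rho \lambda_p \Delta T) \left( N (1 - \rho) \Delta T \left( \lambda_n - \lambda_p + \lambda_p \log \frac{\lambda_p}{\lambda_n} \right) + O(\Delta T^2) \right) \qquad (57) \\
&\quad + \Delta T \rho \lambda_p \left( \log \frac{\lambda_p}{\lambda_n} + O(\Delta T) \right) \qquad (58)
\end{aligned}
$$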



Repeating the exercise for the other component of Equation 43 amounts to exchanging "p" for "n";
adding everything together gives the final result, to first order in ΔT:



$$ E[W] = \left( N (1 - \rho) + \rho \right) \Delta T (\lambda_p - \lambda_n) \log \frac{\lambda_p}{\lambda_n} + O(\Delta T^2) \qquad (59) $$

We note here that as ρ → 0 and ρ → 1, we reproduce the results that would be expected from
Equation 42. Also, a more intuitive and tractable computation can be done for an analogous
additively-correlated Bernoulli process, resulting in the same solution.

A.4 E[W], subtractive (MIP) correlations within pools (SPRT)

In the case of subtractive correlations within pools, the derivation of E[W] is the same as the
additive correlation case, up to Equation 45. In this case, however, we now have:



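$$ E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, H_1 \right] = \left( 1 - \frac{\lambda_p}{\rho} \Delta T \right) E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, \hat{S}_p = 0, H_1 \right] + \frac{\lambda_p}{\rho} \Delta T \, E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, \hat{S}_p = 1, H_1 \right] + O(\Delta T^2) \qquad (60) $$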

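Taking the Ŝ_p = 0 case first, we notice that it is impossible for any spikes to occur without a spike
in the correlating spike train:

$$ \mathbf{S}_p \neq \mathbf{0} \implies P[\mathbf{S}_p \mid \hat{S}_p = 0, H_1] = 0 $$

Because of this, we can simplify:

$$
\begin{aligned}
E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, \hat{S}_p = 0, H_1 \right]
&= \sum_{\mathbf{s}_p} P[\mathbf{s}_p \mid \hat{S}_p = 0, H_1] \log \frac{P[\mathbf{s}_p \mid H_1]}{P[\mathbf{s}_p \mid H_0]} \qquad (61) \\
&= P[\mathbf{0} \mid \hat{S}_p = 0, H_1] \log \frac{P[\mathbf{0} \mid H_1]}{P[\mathbf{0} \mid H_0]} \qquad (62) \\
&= \log \frac{P[\mathbf{0} \mid H_1]}{P[\mathbf{0} \mid H_0]} \qquad (63)
\end{aligned}
$$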

Interestingly, after conditioning on the number of correlating spikes, the probability of the zero
vector (or any vector s_p) is the same under both H_0 and H_1:

$$ P[\mathbf{0} \mid \hat{S}_p = \hat{s}_p, H_0] = P[\mathbf{0} \mid \hat{S}_p = \hat{s}_p, H_1] \qquad (64) $$

We then expand to first order in ΔT:



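$$
\begin{aligned}
\log \frac{P[\mathbf{0} \mid H_1]}{P[\mathbf{0} \mid H_0]}
&= \log \frac{\left( 1 - \frac{\lambda_p}{\rho} \Delta T \right) + \frac{\lambda_p}{\rho} \Delta T \, P[\mathbf{0} \mid \hat{S}_p = 1] + O(\Delta T^2)}{\left( 1 - \frac{\lambda_n}{\rho} \Delta T \right) + \frac{\lambda_n}{\rho} \Delta T \, P[\mathbf{0} \mid \hat{S}_p = 1] + O(\Delta T^2)} \qquad (65) \\
&= \frac{\lambda_p - \lambda_n}{\rho} \Delta T \left( (1 - \rho)^N - 1 \right) + O(\Delta T^2) \qquad (66)
\end{aligned}
$$

In the case of Ŝ_p = 1, only zero-order terms must be computed. When computing

$$ E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, \hat{S}_p = 1, H_1 \right] = \sum_{\mathbf{s}_p} P[\mathbf{s}_p \mid \hat{S}_p = 1, H_1] \log \frac{P[\mathbf{s}_p \mid H_1]}{P[\mathbf{s}_p \mid H_0]} \qquad (67) $$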

the summation only carries over {0, 1} for each element of s_p. The case of s_p = 0 provides no
contribution at zero order, as can be seen by Equation 66; for any other case, there will be a
degeneracy in the expansion of the log, caused by an absence of order 0 terms:



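$$
\begin{aligned}
\mathbf{s}_p \neq \mathbf{0} \implies \log \frac{P[\mathbf{s}_p \mid H_1]}{P[\mathbf{s}_p \mid H_0]}
&= \log \frac{\frac{\lambda_p}{\rho} \Delta T \, P[\mathbf{s}_p \mid \hat{S}_p = 1] + \frac{1}{2} \left( \frac{\lambda_p}{\rho} \Delta T \right)^2 P[\mathbf{s}_p \mid \hat{S}_p = 2] + \dots}{\frac{\lambda_n}{\rho} \Delta T \, P[\mathbf{s}_p \mid \hat{S}_p = 1] + \frac{1}{2} \left( \frac{\lambda_n}{\rho} \Delta T \right)^2 P[\mathbf{s}_p \mid \hat{S}_p = 2] + \dots} \qquad (68) \\
&= \log \frac{\lambda_p}{\lambda_n} + O(\Delta T) \qquad (69)
\end{aligned}
$$

Therefore, to first order in ΔT,

$$
\begin{aligned}
E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, \hat{S}_p = 1, H_1 \right]
&= \log \frac{\lambda_p}{\lambda_n} \sum_{\mathbf{s}_p \neq \mathbf{0}} P[\mathbf{s}_p \mid \hat{S}_p = 1, H_1] + O(\Delta T) \qquad (70) \\
&= \log \frac{\lambda_p}{\lambda_n} \left( 1 - (1 - \rho)^N \right) + O(\Delta T) \qquad (71)
\end{aligned}
$$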

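Combining Equations 60, 63, 66 and 71, we find that:

$$ E\left[ \log \frac{P[\mathbf{S}_p \mid H_1]}{P[\mathbf{S}_p \mid H_0]} \,\middle|\, H_1 \right] = \frac{1 - (1 - \rho)^N}{\rho} \left( \lambda_n - \lambda_p + \lambda_p \log \frac{\lambda_p}{\lambda_n} \right) \Delta T + O(\Delta T^2) \qquad (72) $$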

As before, exchanging "p" for "n" takes care of the expression for the null pool, and adding together
gives:

$$ E[W] = \frac{1 - (1 - \rho)^N}{\rho} (\lambda_p - \lambda_n) \log \frac{\lambda_p}{\lambda_n} \Delta T + O(\Delta T^2) \qquad (73) $$

Once again, as ρ → 0 and ρ → 1, we reproduce the results that would be expected from Equation 42.
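
The three expected increments derived in this appendix (Equations 42, 59 and 73) differ only in an effective pool-size factor multiplying ΔT(λ_p - λ_n) log(λ_p/λ_n). The sketch below collects them in one function so the limits ρ → 0 and ρ → 1 can be checked numerically; the function name, the `model` argument and the parameter values in the usage note are illustrative assumptions.

```python
import math


def sprt_expected_increment(lam_p, lam_n, n, dt, rho=0.0, model="independent"):
    """First-order expected SPRT increment E[W] for the three correlation models."""
    base = dt * (lam_p - lam_n) * math.log(lam_p / lam_n)
    if model == "independent":
        return n * base  # Equation 42
    if model == "sip":
        return (n * (1 - rho) + rho) * base  # Equation 59
    if model == "mip":
        if rho <= 0:
            raise ValueError("the MIP expression requires rho > 0")
        return (1 - (1 - rho) ** n) / rho * base  # Equation 73
    raise ValueError("model must be 'independent', 'sip' or 'mip'")
```

With the hypothetical values N = 100 and ρ = 0.15, the effective factor is 85.15 for the additive model but only about 6.7 for the subtractive model, which matches the qualitative conclusion that the SPRT loses little under SIP correlations and a great deal under MIP correlations.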



B Spike integration 

B.1 Independent spiking (SI)

Computing FC and DT for the spike integration accumulation model relies on computation of the
MGF for the sampling distribution. We begin with several identities that will be useful below. The
MGF for the sum of N independent, identically distributed random variables is:

$$ S = \sum_{i=1}^{N} S_i \implies \phi_S(t) = \phi_{S_i}(t)^N \qquad (74) $$

Given that the MGF for a random variable S is φ_S(t), it follows that

$$ \phi_{-S}(t) = \phi_S(-t) \qquad (75) $$

Finally, the MGF for a Poisson random variable with parameter λ is:

$$ \phi(t) = e^{\lambda (e^{t} - 1)} \qquad (76) $$

Given the definition of the increment variable in Equation 14, and noting that each spike count
random variable is independent, we can combine these observations to construct the MGF for the
sampling random variable, over a time window ΔT:

$$ \phi_W(t) = \left( e^{\lambda_p \Delta T (e^{t} - 1)} \, e^{\lambda_n \Delta T (e^{-t} - 1)} \right)^N \qquad (77) $$

Now the nontrivial root can be calculated (cf. Equation 18):

$$ h_0 = -\log \left( \frac{\lambda_p}{\lambda_n} \right) \implies \phi_W(h_0) = 1 \qquad (78) $$

Because the MGF is known explicitly, the computation of the expected increment is simple (cf.
Equation 19):

$$ E[W] = \phi_W'(0) = \Delta T \, N (\lambda_p - \lambda_n). \qquad (79) $$

B.2 Additive (SIP) correlated interactions within pools (SI) 

When additive correlations are introduced within pools, the spike count distribution MGF over
a time period ΔT can still be broken into the product of two separate MGFs, one each for the
preferred and null pools, which are identical in form but differ in their Poisson rate parameters
(indicated by the semicolon):

$$ \phi_W(t) = \phi_S(t; \lambda_p \Delta T) \, \phi_S(-t; \lambda_n \Delta T) \qquad (80) $$

For the preferred pool, the spike count can be broken into two independent contributions: spikes
Ŝ from the shared (i.e. correlating) spike train that get counted N times (firing at a rate ρλ), and
spikes from the N independent spike trains that get counted once (each firing at a rate (1 - ρ)λ):

$$ \phi_S(t; \lambda) = \phi_{\hat{S}}(t; \lambda) \, \phi_{S_i}(t; \lambda)^N \qquad (81) $$

The MGF for the shared spike train can be computed directly from the definition, using its
probability mass function (PMF):

$$ P[\hat{S} = k] = \begin{cases} \dfrac{e^{-\rho \lambda} (\rho \lambda)^{i}}{i!}, & k = iN, \; i \in \mathbb{N}_0, \\ 0, & \text{otherwise}, \end{cases} \qquad (82) $$

and thus:

$$ \phi_{\hat{S}}(t; \lambda) = \sum_{k=0}^{\infty} e^{tk} P[\hat{S} = k] = \sum_{k=0}^{\infty} e^{tNk} \frac{e^{-\rho \lambda} (\rho \lambda)^{k}}{k!} = e^{(e^{Nt} - 1) \rho \lambda} \qquad (83) $$

The MGF for the independent spike trains φ_{S_i}(t; λ) follows from Section B.1, giving the form of
the MGF of the increment over a time ΔT as:

$$ \phi_W(t) = e^{(e^{Nt} - 1) \rho \lambda_p \Delta T} \, e^{(e^{-Nt} - 1) \rho \lambda_n \Delta T} \left( e^{(e^{t} - 1)(1 - \rho) \lambda_p \Delta T} \right)^N \left( e^{(e^{-t} - 1)(1 - \rho) \lambda_n \Delta T} \right)^N \qquad (84) $$

After rearranging, h_0 is implicitly defined as the nontrivial root of:

$$ \lambda_p \left( \rho (e^{Nt} - 1) + (1 - \rho) N (e^{t} - 1) \right) + \lambda_n \left( \rho (e^{-Nt} - 1) + (1 - \rho) N (e^{-t} - 1) \right) = 0 \implies h_0 = t \qquad (85) $$

As ρ → 0, we recover the solution from Section B.1. The expected increment can be directly
computed as:

$$ E[W] = \Delta T \, N (\lambda_p - \lambda_n) \qquad (86) $$

Note that this last expression is the same as in the independent case (Equation 19), as expected,
and that unlike the SPRT, no limits in ΔT were necessary to compute the parameters for the FC
and DT functions.



B.3 Subtractive (MIP) correlated interactions within pools (SI) 

With subtractive correlations, we again derive an MGF for the spike count vector of an individual
pool φ_S(t; λ), and apply Equation 80. In this case, however, the number of spikes in a pool,
conditioned on the number of spikes in that pool's correlating train, is binomially distributed. Thus,
applying the Law of Total Probability:

$$ P[S = s] = \sum_{\hat{s} = 0}^{\infty} \mathrm{Poiss}\left[ \hat{s}; \frac{\lambda}{\rho} \right] \mathrm{Binom}[\hat{s} N, s; \rho] \qquad (87) $$

Using the definitions for the PMFs of the Poiss[i; λ] and Binom[N, k; p] distributions, we have:

$$ \phi_S(t; \lambda) = E[e^{St}] = \sum_{s=0}^{\infty} e^{st} P[S = s] = e^{\left( (1 + \rho (e^{t} - 1))^N - 1 \right) \frac{\lambda}{\rho}} \qquad (88) $$

After applying Equation 80 with this MGF for both the preferred and null population, we find an
implicit relationship for the non-trivial real root t = h_0 that does not depend on ΔT:

$$ \left( (1 + \rho (e^{t} - 1))^N - 1 \right) \lambda_p + \left( (1 + \rho (e^{-t} - 1))^N - 1 \right) \lambda_n = 0 \implies h_0 = t \qquad (89) $$

As before, the expected increment can be directly computed by differentiation, and we find the
same expression as in the additive correlation case:

$$ E[W] = \Delta T \, N (\lambda_p - \lambda_n) \qquad (90) $$

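
The claim that both correlation models leave the expected increment of the spike integrator unchanged can be checked numerically by differentiating the logarithm of each MGF at t = 0. The sketch below does this with a central difference; the log-MGFs follow the exponents of Equations 84 and 88 (combined via Equation 80), while the function names, step size and parameter values are illustrative assumptions.

```python
import math


def log_mgf_sip(t, lam_p, lam_n, rho, n, dt):
    """log(phi_W(t)) under additive (SIP) correlations, from Equation 84."""
    def pool(s, lam):
        return dt * lam * (rho * math.expm1(n * s) + n * (1 - rho) * math.expm1(s))
    return pool(t, lam_p) + pool(-t, lam_n)


def log_mgf_mip(t, lam_p, lam_n, rho, n, dt):
    """log(phi_W(t)) under subtractive (MIP) correlations, from Equations 80 and 88."""
    def pool(s, lam):
        return dt * (lam / rho) * ((1 + rho * math.expm1(s)) ** n - 1)
    return pool(t, lam_p) + pool(-t, lam_n)


def mean_increment(log_mgf, lam_p, lam_n, rho, n, dt, h=1e-6):
    """E[W] as the derivative of log(phi_W) at t = 0, by central difference."""
    up = log_mgf(h, lam_p, lam_n, rho, n, dt)
    down = log_mgf(-h, lam_p, lam_n, rho, n, dt)
    return (up - down) / (2 * h)
```

For the hypothetical values λ_p = 2, λ_n = 1, ρ = 0.15, N = 100 and ΔT = 0.001, both `mean_increment(log_mgf_sip, ...)` and `mean_increment(log_mgf_mip, ...)` agree with ΔT N (λ_p - λ_n) = 0.1 to within the finite-difference error.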
C Speed and accuracy functions with overshoot 

The identities provided in Equations 11 and 12 are very useful; however, they are simplifications of the
full formulas for FC and DT (assuming E[W] ≠ 0) derived by Wald [H], which are:

$$ FC = 1 - \frac{E[e^{h_0 E_n} \mid E_n \geq \theta] - 1}{E[e^{h_0 E_n} \mid E_n \geq \theta] - E[e^{h_0 E_n} \mid E_n \leq -\theta]} \qquad (91) $$

$$ DT = \frac{\Delta T}{E[W]} \left( E[E_n \mid E_n \geq \theta] \, FC + E[E_n \mid E_n \leq -\theta] \, (1 - FC) \right) \qquad (92) $$

Specifically, Equations 11 and 12 hold under the assumption that the value of the state variable on
the decision step is exactly equal to the decision threshold. In practice, however, this "no-overshoot"
assumption may not provide a particularly good approximation.

A correction term based on the mean of the overshoot distribution - that is, the distribution 
of the random variable defined by the excess distance over either the positive or negative threshold 
on the threshold crossing step - is suggested by Lee et al. [27]. This correction is based on the
Taylor expansion of the conditional expectations in Equation 91, and takes the form of a shift in
the decision threshold. A correction of this form is relevant to our analysis, as the performance of 
two models are compared parametrically in the threshold to isolate the effects of the speed-accuracy 
tradeoff imparted by freely adjusting the threshold. 
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Denote the value of E n conditioned on crossing the first threshold as E n , and let X = E n — 
9 overshoot random variable, with mean ux- Expanding the conditional expectation (although 
dropping the conditional notation for convenience) via a Taylor series centered on this mean (the 
so-called delta method), we have 

E[e hoEn ] = E[e h ° r ° + h e h ° r °{E n - r„) + ^ ^ °> + ...] (93) 

Choosing tq = 9 yields an expression of Wald's truncation: 

$$E[e^{h_0 E_n}] = e^{h_0 \theta} \left( 1 + h_0 E[X] + \frac{h_0^2}{2} E[X^2] + \dots \right) \qquad (94)$$

$$\approx e^{h_0 \theta} \qquad (95)$$

Here we see that if $E_n = \theta$ (so that $X = 0$), every term beyond the first vanishes and Wald's
approximation holds exactly. On the other hand, if $E_n$ overshoots $\theta$, error accumulates at each
term in the expansion, as a function of the moments of the overshoot distribution. If instead the
expansion is performed about $r_0 = \theta + \mu_X$, the first-order term vanishes, and the resulting
threshold-shifted approximation expresses the truncation error in terms of the second and higher
central moments of the overshoot distribution:

$$E[e^{h_0 E_n}] = e^{h_0 (\theta + \mu_X)} \left( 1 + \frac{h_0^2}{2} E[(X - \mu_X)^2] + \dots \right) \qquad (96)$$

$$\approx e^{h_0 (\theta + E[X])} \qquad (97)$$



In practice, the overshoot is often nonzero; however, if its mean can be calculated
and $h_0 < 0$, the latter approximation may have a smaller truncation error, as long as the
higher-order central moments do not grow too large. For the decision time, the threshold-shifted
expression is exact and introduces no additional error, because Equation [92] depends on $E_n$ only
through its conditional means, which are exactly the shifted thresholds.
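
The comparison between the no-overshoot and threshold-shifted approximations can be illustrated numerically. The following is a minimal sketch, not the analysis used in this paper: it assumes increments given by the difference of two independent Poisson counts (for which the nonzero root of $E[e^{hW}] = 1$ is $h_0 = \log(\text{mean\_null}/\text{mean\_pref})$), and the function names, parameter values, and the hand-written Poisson sampler are all hypothetical choices made for illustration. It evaluates Equation (91) three ways: with empirical conditional expectations, with the no-overshoot values, and with thresholds shifted by the empirical mean overshoot.

```python
import math
import random

def poisson(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (small means only)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def run_trial(rng, mean_pref, mean_null, theta):
    """Accumulate W = (preferred count) - (null count) until |E_n| >= theta."""
    state = 0
    while abs(state) < theta:
        state += poisson(rng, mean_pref) - poisson(rng, mean_null)
    return state

def wald_fc(e_up, e_low):
    """Equation (91), given the two conditional expectations of exp(h0 * E_n)."""
    return 1.0 - (e_up - 1.0) / (e_up - e_low)

def mean(xs):
    return sum(xs) / len(xs)

def compare(mean_pref=1.0, mean_null=0.5, theta=5, trials=20000, seed=0):
    rng = random.Random(seed)
    # Assumption: independent Poisson increments, so h0 has this closed form.
    h0 = math.log(mean_null / mean_pref)
    finals = [run_trial(rng, mean_pref, mean_null, theta) for _ in range(trials)]
    upper = [x for x in finals if x >= theta]
    lower = [x for x in finals if x <= -theta]
    mu_up = mean(upper) - theta     # mean overshoot beyond +theta
    mu_low = -theta - mean(lower)   # mean overshoot beyond -theta
    return {
        "h0": h0,
        "empirical": len(upper) / trials,
        "exact_91": wald_fc(mean([math.exp(h0 * x) for x in upper]),
                            mean([math.exp(h0 * x) for x in lower])),
        "no_overshoot": wald_fc(math.exp(h0 * theta), math.exp(-h0 * theta)),
        "shifted": wald_fc(math.exp(h0 * (theta + mu_up)),
                           math.exp(-h0 * (theta + mu_low))),
    }

if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name:>12s}: {value:.4f}")
```

Because the increments here are integer-valued and can exceed one, the state variable frequently lands beyond the threshold, so the "no_overshoot" entry generally differs from the empirical fraction correct; how much of that gap the "shifted" entry closes depends on the spread of the overshoot distribution.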



D Joint cumulants for the SIP and MIP model 

Staude et al. [39] suggest that cumulants provide a "natural and intuitive higher-order
generalization of the covariance" for multineuron spiking. The two models of correlated activity
examined here are indistinguishable when only examining first-order (i.e., mean firing rate) or
second-order (i.e., pairwise correlations) statistics. Here, we derive the joint cumulants for each
of these two models, to clarify how the spike count distributions produced by the two models differ
at higher orders.

The derivation relies on the conditional independence of the spike counts for each neuron in
a pool, conditioned upon the spike count in the common spike train. Let $S_1, \dots, S_N$ be the random
variables giving spike counts in a window of size $\Delta T$ from each of the $1, \dots, N$ neurons in a
correlated pool, and $S$ be the spike count in the common spike train. The law of total cumulance [7]
allows a relatively simple expression of the joint cumulant on $k$ members of $S_1, \dots, S_N$ (because
of the homogeneity of the pool, we will express the $k$th joint cumulant as calculated on
$S_1, \dots, S_k$, but the same expression holds for any $k$-sized subset of $S_1, \dots, S_N$):

$$\kappa(S_1, \dots, S_k) = \sum_{\pi \in \Pi} \kappa\left( \kappa(S_{B_1} \mid S), \dots, \kappa(S_{B_b} \mid S) \right) \qquad (98)$$



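Here $\Pi$ is the set of all partitions of $\{1, \dots, k\}$, for example

$$\Pi[\{1,2,3\}] = \{\{\{1\},\{2\},\{3\}\}, \{\{1,2\},\{3\}\}, \{\{1\},\{2,3\}\}, \{\{1,3\},\{2\}\}, \{\{1,2,3\}\}\} \qquad (99)$$
$$= \{\pi_1, \pi_2, \pi_3, \pi_4, \pi_5\} \qquad (100)$$

and $\kappa(S_{B_j} \mid S)$ is the conditional joint cumulant over the set of all spike counts indexed by
elements of the block $B_j$ - that is, the set $\{S_l : l \in B_j\}$, where $B_1, \dots, B_b$ are the blocks of the partition $\pi_i$.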
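
To make the sum over partitions concrete, the set $\Pi$ can be enumerated directly. The sketch below is purely illustrative (the function name `set_partitions` is a hypothetical helper, not something used in the paper): it recursively generates every partition of a list of indices, reproducing the five partitions of $\{1,2,3\}$ listed in Equation (99).

```python
def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (each block a list)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either place `first` in a block of its own ...
        yield [[first]] + partition
        # ... or add it to each existing block in turn.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]

if __name__ == "__main__":
    partitions = list(set_partitions([1, 2, 3]))
    for blocks in partitions:
        print(blocks)
    # Partitions whose blocks are all singletons (exactly one exists for any k).
    singleton_only = [p for p in partitions if all(len(b) == 1 for b in p)]
    print(len(partitions), len(singleton_only))
```

The number of partitions grows as the Bell numbers (5 for $k = 3$, 15 for $k = 4$), which is why any structure that eliminates most terms of the sum is valuable.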

In our special case, $\kappa(S_{B_j} \mid S) = 0$ whenever $|B_j| > 1$, owing to the conditional independence of
each neuron given the common spike train. Moreover, from the definition of the cumulant, the term
of Equation [98] for any partition $\pi_i$ that contains such a block $B_j$ will also be zero. This implies
that the only $\pi_i \in \Pi$ that contributes in Equation [98] is $\pi_1 = \{\{1\}, \dots, \{k\}\}$ ($i = 1$ in the
example of Equation [100]); thus

$$\kappa(S_1, \dots, S_k) = \kappa(E[S_1 \mid S], \dots, E[S_k \mid S]) = \kappa_k(E[S_1 \mid S]), \qquad (101)$$

where we have used the fact that the first cumulant is simply the expected value. Using the
cumulant generating function, we then have a formula for the joint cumulant:

$$\kappa(S_1, \dots, S_k) = \left. \frac{d^k}{dt^k} \log E\left[ e^{t E[S_1 \mid S]} \right] \right|_{t=0} \qquad (102)$$

Thus, for the two models of correlations (assuming a firing rate $\lambda$), we have:

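$$\text{MIP:} \quad E[S_1 \mid S = s] = \sum_{s_1=0}^{s} s_1 \binom{s}{s_1} p^{s_1} (1-p)^{s-s_1} = ps \qquad (103)$$

$$\log E\left[ e^{t E[S_1 \mid S]} \right] = \log \sum_{s=0}^{\infty} e^{-\lambda \Delta T / p} \frac{(\lambda \Delta T / p)^s}{s!} e^{pts} \qquad (104)$$

$$= \frac{\lambda \Delta T (e^{pt} - 1)}{p} \qquad (105)$$

$$\kappa(S_1, \dots, S_k) = \left. \frac{d^k}{dt^k} \left[ \frac{\lambda \Delta T (e^{pt} - 1)}{p} \right] \right|_{t=0} \qquad (106)$$

$$= \Delta T \lambda p^{k-1} \qquad (107)$$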
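$$\text{SIP:} \quad E[S_1 \mid S = s] = \sum_{s_1=s}^{\infty} s_1 \frac{e^{-\lambda \Delta T (1-p)} (\lambda \Delta T (1-p))^{s_1 - s}}{(s_1 - s)!} = s + \lambda \Delta T (1-p) \qquad (108)$$

$$\log E\left[ e^{t E[S_1 \mid S]} \right] = \log \sum_{s=0}^{\infty} \frac{e^{-\Delta T \lambda p} (\Delta T \lambda p)^s}{s!} e^{t (s + \lambda \Delta T (1-p))} \qquad (109)$$

$$= \lambda \Delta T \left( p [e^t - t - 1] + t \right) \qquad (110)$$

$$\kappa(S_1, \dots, S_k) = \left. \frac{d^k}{dt^k} \left[ \lambda \Delta T \left( p [e^t - t - 1] + t \right) \right] \right|_{t=0} \qquad (111)$$

$$= \begin{cases} \lambda \Delta T & : k = 1 \\ \lambda \Delta T p & : k > 1 \end{cases} \qquad (112)$$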

Comparing Equations [107] and [112] (see also Figure [9]), we see agreement for $k \leq 2$ as expected;
these correspond to the intended firing rate and pairwise covariance of neurons within the pool.
However, for $k > 2$, we see the signature of the differences in the structure of the correlations. For
the MIP model, the joint cumulant decays geometrically as more and more neurons are considered.
In contrast, the joint cumulant remains constant for the SIP model.
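
The closed forms in Equations (107) and (112) can be checked numerically without differentiating the cumulant generating function. The sketch below is an illustration only, with hypothetical function names and parameter values: it builds the distribution of $E[S_1 \mid S]$ for each model from a truncated Poisson distribution of the common spike count, computes raw moments, and converts them to cumulants with the standard moment-to-cumulant recursion.

```python
import math

def poisson_pmf(mean, s_max):
    """Poisson probabilities for s = 0..s_max, built iteratively to avoid overflow."""
    pmf = [math.exp(-mean)]
    for s in range(1, s_max + 1):
        pmf.append(pmf[-1] * mean / s)
    return pmf

def cumulants(values, pmf, k_max):
    """First k_max cumulants of a discrete variable taking values[i] with probability pmf[i]."""
    moments = [sum(w * v ** n for v, w in zip(values, pmf)) for n in range(k_max + 1)]
    kappa = [0.0] * (k_max + 1)
    for n in range(1, k_max + 1):
        # kappa_n = m_n - sum_{j=1}^{n-1} C(n-1, j-1) * kappa_j * m_{n-j}
        kappa[n] = moments[n] - sum(
            math.comb(n - 1, j - 1) * kappa[j] * moments[n - j] for j in range(1, n)
        )
    return kappa[1:]

def mip_cumulants(rate, dt, p, k_max, s_max=200):
    """MIP: common count S ~ Poisson(rate*dt/p), and E[S_1 | S = s] = p*s."""
    pmf = poisson_pmf(rate * dt / p, s_max)
    return cumulants([p * s for s in range(s_max + 1)], pmf, k_max)

def sip_cumulants(rate, dt, p, k_max, s_max=200):
    """SIP: common count S ~ Poisson(rate*dt*p), and E[S_1 | S = s] = s + rate*dt*(1-p)."""
    pmf = poisson_pmf(rate * dt * p, s_max)
    return cumulants([s + rate * dt * (1 - p) for s in range(s_max + 1)], pmf, k_max)

if __name__ == "__main__":
    rate, dt, p = 10.0, 0.1, 0.2  # illustrative values, not taken from the paper
    for k, (m, s) in enumerate(zip(mip_cumulants(rate, dt, p, 5),
                                   sip_cumulants(rate, dt, p, 5)), start=1):
        print(k, m, rate * dt * p ** (k - 1), s, rate * dt * (1.0 if k == 1 else p))
```

Each printed row pairs the numerically computed cumulant with the corresponding closed-form value, so the geometric decay for MIP and the constant value for SIP at $k \geq 2$ can be read off side by side.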